Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add call to HandleParticlesAtBoundaries in WarpXEvolve.cpp to prevent oob segfaults #5136

Closed
wants to merge 3 commits into from

Conversation

archermarx
Copy link
Contributor

Fixes bug found here #5065 and partially addresses #5113

This PR just adds a call to HandleParticlesAtBoundaries to the synchronization step part of WarpXEvolve.cpp to prevent unhelpful memory access errors when particles travel too far in one step.

Before, this error would be thrown:

amrex::Abort::3::CUDA error 700 in file /home/marksta/src/warpx/build/_deps/fetchedamrex-src/Src/Base/AMReX_GpuDevice.cpp
line 648: an illegal memory access was encountered !!!
SIGABRT

Now, no error is thrown and the simulation finishes successfully.

Ideally, we would still like to warn users about CFL violations in ES simulations, or adapt the timestep to account for a user-specified CFL number (which may be greater than 1 for semi-implicit schemes). I'll keep looking at that.

@archermarx archermarx requested a review from ax3l August 14, 2024 15:44
@archermarx
Copy link
Contributor Author

Looks like the qed_schwinger and Point_of_contact_EB_3d FAILED tests fail, but only by a bit (errors on order of 0.01 relative tolerance vs the requested 0.001). I could wrap this line in #if not defined(WARPX_QED) || not defined (WARPX_EB), perhaps

Also, the dsmc_1d test fails now, with relative errors of O(1e-8) as opposed to 1e-9.

@ax3l
Copy link
Member

ax3l commented Aug 15, 2024

I think one could reset these checksums. The test changes the order of particles that go into the checksum by redistributing them. Long floating point sums are not commutative.

@archermarx
Copy link
Contributor Author

How could I go about doing that?

@@ -176,6 +176,8 @@ WarpX::Evolve (int numsteps)
if (evolve_scheme == EvolveScheme::Explicit) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest a more correct way of fixing this is to move this whole block of code to the end of the time step after the electrostatic field calculation (and before the diagnostics). This is suggested in the ancient PR #1751. This fixes two things. At that point the particle boundary conditions would have been handled anyway, and also the fields updated to the correct time level for the velocity push.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that seems reasonable.

Copy link
Member

@dpgrote dpgrote Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this causes many CI tests to fail. I can work through the tests and update the benchmarks, making sure that nothing is actually broken. This will take some time.

@archermarx
Copy link
Contributor Author

Closing this for now. May create a fresh PR in the future.

@archermarx archermarx closed this Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants